Abstract:
Singular value decomposition (SVD) is a widely used tool in data analysis and numerical linear algebra. Computing the truncated SVD of a very large matrix is difficult because of its excessive time and memory cost. In this work, we aim to tackle this difficulty and enable accurate SVD computation for large data that cannot be loaded into memory. We first propose a randomized SVD algorithm that makes fewer passes over the matrix: it halves the number of passes required by the basic randomized SVD with almost no loss of accuracy. We then propose a shifted power iteration technique to improve the accuracy of the result, which includes a dynamic scheme for updating the shift value in each power iteration. Finally, combining the proposed techniques with several acceleration techniques, we develop a pass-efficient randomized SVD (PerSVD) algorithm for efficient and accurate treatment of large data stored on hard disk. Experiments on synthetic and real-world data validate that the proposed techniques substantially improve the accuracy of randomized SVD for the same number of passes over the matrix. With 3 or 4 passes over the data, PerSVD reduces the error of the SVD result by three or four orders of magnitude compared with the basic randomized SVD and single-pass SVD algorithms, with similar or less runtime and less memory usage.
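For orientation, the "basic randomized SVD" that the abstract uses as its baseline can be sketched roughly as follows. This is not PerSVD and does not include the pass-halving or shifted power iteration proposed in the paper; it is a minimal in-memory illustration of the standard range-finder scheme. The function name, the oversampling default, and the number of power iterations are assumptions chosen for the example.

```python
import numpy as np

def basic_randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Illustrative baseline only (hypothetical helper, not the paper's PerSVD).

    Every product with A or A.T is one pass over the matrix, so this
    sketch makes 2 * power_iters + 2 passes in total.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, m, n)            # sketch width
    omega = rng.standard_normal((n, l))      # random test matrix
    Q, _ = np.linalg.qr(A @ omega)           # pass 1: sample the range of A
    for _ in range(power_iters):
        G, _ = np.linalg.qr(A.T @ Q)         # one pass (re-orthonormalized)
        Q, _ = np.linalg.qr(A @ G)           # one pass
    B = Q.T @ A                              # final pass: project A onto the basis
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :k], s[:k], Vt[:k, :]
```

Calling `basic_randomized_svd(A, k=5)` on a NumPy array `A` returns the leading 5 approximate singular triplets. For data stored on disk, each product with `A` would instead be accumulated block by block, which is where the number of passes becomes the dominant cost.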
Abstract:
Inconsistencies in a database can be detected from violations of integrity constraints such as functional dependencies (FDs). In the big data era, the availability of many related data sources offers an opportunity to detect inconsistencies more extensively: even when no violation exists within a single data set D, other data sources can be leveraged to discover potential violations. A significant challenge for source-based violation detection is that accessing too many data sources incurs a huge cost, while involving too few may miss serious violations. Motivated by this, we investigate how to select a proper subset of sources for inconsistency detection. To address this problem, we formulate a gain model for sources and introduce the source selection optimization problem, called SSID, in which the gain is maximized subject to the cost staying under a threshold. We show that the SSID problem is NP-hard and propose a greedy approximation approach for it. To avoid accessing the data sources, we also present a randomized technique for gain estimation with theoretical guarantees. Experimental results on both real and synthetic data show that our algorithm is both effective and efficient.
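The abstract does not spell out the gain model or the exact greedy rule, so the following is only a generic sketch of gain-per-cost greedy selection under a cost threshold. The source names, the costs, and the use of "number of newly detectable violations" as the gain are all hypothetical stand-ins, and the sketch assumes the violation sets are already known rather than estimated.

```python
def greedy_select(sources, budget):
    """Pick sources by best marginal gain per unit cost (illustrative only).

    sources: dict mapping a source name to (cost, set of violation ids),
             with cost > 0. The gain model here is an assumption.
    budget:  maximum total cost allowed.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(sources)
    while remaining:
        best, best_ratio = None, 0.0
        for name, (cost, violations) in remaining.items():
            if spent + cost > budget:
                continue                     # would exceed the cost threshold
            ratio = len(violations - covered) / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break                            # nothing affordable adds any gain
        cost, violations = remaining.pop(best)
        chosen.append(best)
        covered |= violations
        spent += cost
    return chosen, covered, spent
```

With `sources = {"s1": (2.0, {1, 2, 3}), "s2": (1.0, {3, 4}), "s3": (5.0, {1, 2, 3, 4, 5, 6})}` and a budget of 3.0, the sketch picks `s2` first (highest gain per cost), then `s1`, and never `s3`, which alone exceeds the budget.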
Abstract:
The theory of cycle-slip detection and repair using combinations of BeiDou triple-frequency observations is described. The principles for optimizing the triple-frequency combinations, the cycle-slip threshold, and the relationship between the STD and the success rate of cycle-slip fixing are discussed. Taking ionospheric delay into account, a phase/pseudorange combination suitable for BeiDou triple-frequency observations is selected, and measured data are used to estimate the combined cycle slip. The results show that the STD of the cycle slip computed by the three-stage method is smaller than that of the pseudorange/phase combination, which ensures the accuracy of cycle-slip detection and repair, and that the workload of cycle-slip detection and repair with the three-stage method is lower than with the geometry-free phase combination.
Abstract:
Link prediction has attracted a lot of attention in recent years. While most researchers focus on finding effective prediction methods, they overlook the use of key users for friendship expansion. In this paper, we study the important factors that affect people's decisions to form friendships and quantify the relationship strength between users and their friends, based on which we build an overall hierarchical network. From this network, we extract features that measure the closeness and difference between users and use them in supervised learning with classical classifiers for link prediction. Experimental results show that our proposed method substantially outperforms existing unsupervised link prediction methods in terms of AUROC (area under the ROC curve).
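As a rough illustration of the evaluation setting mentioned above, the sketch below scores candidate links with common neighbors, one widely used unsupervised baseline, and computes AUROC as the probability that a true link is scored above a non-link (ties count as one half). The abstract does not name the baselines it compares against or the features it extracts, so the function names and the choice of common neighbors are assumptions made for the example.

```python
def common_neighbors(adj, u, v):
    """Unsupervised link score: number of neighbors shared by u and v.

    adj maps each node to the set of its neighbors (hypothetical input format).
    """
    return len(adj[u] & adj[v])

def auroc(pos_scores, neg_scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0                  # positive ranked above negative
            elif p == n:
                wins += 0.5                  # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))
```

Here `pos_scores` would hold the scores of held-out true links and `neg_scores` the scores of sampled non-links; a value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.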
Abstract:
Compared with qualitative analysis, quantitative analysis of VTEC (vertical total electron content) is more conducive to the modeling, forecasting, and correction of ionospheric delay. Using Empirical Mode Decomposition (EMD), the mean nighttime VTEC is investigated quantitatively with global grid ionospheric data from the IGS (International GPS Service for Geodynamics). The evaluation in this paper is carried out by comparing GIM data with the quantitative analysis result and with the Klobuchar model. The final result indicates that the quantitative analysis reliably describes and forecasts the trend of the global mean nighttime VTEC over a solar activity cycle.